Nabil Salehiyan
Multivariate Analysis Cookbook
The University of Texas at Dallas
Dr. Herve Abdi
Principal Component Analysis
Principal Component Analysis is a method of data analysis that reduces noise, finds components (orthogonal to
each other) that explain the variables, shows similarity between variables (through angles), shows similarity between
observations (through distances), and compresses the information to show only what matters. The goal is to see which
components are the main contributors to the phenomena we observe: for example, which aspects of cell phones
(such as camera megapixels) predict the price range they fall in.
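The figures in this section were produced with dedicated multivariate software; as a minimal sketch of the computation, a PCA of the standardized phone variables might look like this in Python (the file name is an assumption):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

phones = pd.read_csv("mobile_phones.csv")          # hypothetical file name
X = phones[["sc_h", "sc_w", "fc", "pc",
            "px_height", "px_width", "m_dep", "int_memory"]]

# PCA operates on centered (and here, standardized) data
Z = StandardScaler().fit_transform(X)
pca = PCA()
scores = pca.fit_transform(Z)                      # factor scores for the observations

# Eigenvalues (component variances) are what the scree plot displays
print(pca.explained_variance_)
print(pca.explained_variance_ratio_.cumsum())
```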
Data
The data I was given for this analysis describe mobile phones. The observations are grouped by price range (low, med,
high, vhigh). The variables are:
sc_h: screen height
sc_w: screen width
fc: front camera megapixels
pc: primary camera megapixels
px_height: pixel resolution height
px_width: pixel resolution width
m_dep: mobile depth in centimeters
int_memory: internal memory in GB
Methods
In the scree plot we see that 5 or 6 components could be used to describe the data. For simplicity, I will look only at
the first two dimensions.
In this barplot we see that four variables make up the first dimension: sc_w, sc_h, pc, and fc.
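Continuing the sketch above, the contribution of a variable to a dimension is its squared (unit-norm) loading, which is what the barplot displays; values above the average of 1/(number of variables) flag the important contributors:

```python
# pca and X come from the PCA sketch above
ctr = pca.components_.T ** 2                 # variables x components; columns sum to 1
dim1 = pd.Series(ctr[:, 0], index=X.columns).sort_values(ascending=False)
print(dim1)                                  # bars above 1/8 count as "important" here
```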
For dimension 2, we see that the main contributions are made by px_width and px_height.
In a bootstrap analysis I saw that some of the gray variables are also contributors to these dimensions, which suggests
that their failure to contribute in the raw data was due to chance. If this analysis were repeated on samples drawn
with replacement from an infinite population, these other variables could be trusted to replicate as contributors to
dimensions 1 and 2. The only two that did not contribute (in either the raw or the bootstrapped results) were n_cores
and mobile_wt.
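A rough sketch of how such a bootstrap can be computed, resampling observations with replacement and refitting the PCA (dedicated packages handle the details, such as sign alignment, more carefully):

```python
import numpy as np

# Z and pca come from the PCA sketch above
rng = np.random.default_rng(42)
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(Z), len(Z))            # resample rows with replacement
    p = PCA(n_components=2).fit(Z[idx])
    L = p.components_.T
    # crude sign alignment with the original solution so replicates are comparable
    L *= np.sign((L * pca.components_.T[:, :2]).sum(axis=0))
    boot.append(L)
boot = np.stack(boot)
# bootstrap ratio = mean / std over replicates; |ratio| > 2 is the usual cutoff
print(boot.mean(axis=0) / boot.std(axis=0))
```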
Here, in the variable-loadings-as-inertia graph, we can confirm the variables that make up the first two dimensions:
sc_w/sc_h and pc/fc make up dimension 1, and px_width/px_height make up dimension 2. We also see that n_cores
and int_memory sit in the middle; they likewise did not show up as significant contributors in the bar graphs. To
investigate whether these variables are true averages or fall in another dimension, we must look to the circle of
correlation.
In this circle of correlation, we see that n_cores and int_memory are still in the center and nowhere near the edge of
the circle, which suggests that they lie in a different dimension. Also, if we look at the variable-loadings-as-correlations
table that was provided, we see that the highest loading for int_memory is on the fourth dimension and the highest
for n_cores on the fifth.
Furthermore, we can infer some correlations between the variables by looking at the arrows. We see that fc and sc_h
are almost orthogonal to each other, which should mean they do not relate to one another. We also see that pc and
sc_w point in almost opposite directions, which suggests a large negative correlation. Lastly, we see some positive
correlations between sc_w and sc_h, fc and pc, and px_width and px_height, because their arrows point in the same
direction. We must keep in mind that, given how far these variables sit from the edge of the circle, we cannot
reliably approximate their correlations from the map alone. To confirm these inferences, I looked at the correlation
matrix heat map.
The heat map confirms all the speculations I made: pc and fc have a positive correlation of .62, px_height and
px_width a positive correlation of .48, and sc_h and sc_w a positive correlation of .48. We also see a negative
correlation of −.19 between pc and sc_w, and no correlation between fc and sc_h (0).
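Both the heat map and the circle-of-correlation coordinates can be recomputed directly from the data; a short sketch, reusing the objects from the PCA code above:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# The heat map is just the raw correlation matrix of the eight variables
sns.heatmap(X.corr(), annot=True, fmt=".2f", cmap="coolwarm", center=0)
plt.show()

# Circle-of-correlation coordinates: correlation of each variable with the first
# two factor scores (points near the unit circle are well represented)
r = np.corrcoef(Z.T, scores[:, :2].T)[:Z.shape[1], Z.shape[1]:]
```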
We can now move on to the observation groups, which are the price ranges low, med, high, and vhigh.
Here the colored dots seem to show an inverse relationship: low and high price fall on opposite sides of the map, as
do vhigh and medium price. Also, when we look at the variables projected onto the observation groups, features such
as fc and pc seem to relate to low price (interestingly), screen size falls in the vhigh price category, and pixel
resolution falls between the high and medium prices. To investigate the reliability of this map, we must look at the
tolerance interval and confidence interval graphs.
Here we see overlap between the tolerance intervals, which shows that we cannot reliably assign individual
observations to groups; we cannot say for certain that fc and pc predict low-priced phones or that screen size predicts
high prices.
Lastly, we do not see overlap between the confidence intervals, which suggests that the group averages do in fact
differ. So, although we have low accuracy in group assignment, we can reliably distinguish the average positions of
these price groups from one another.
Summary
Through this data visualization we can infer that, on dimensions 1 and 2, individual cell phones cannot be reliably
assigned to the price-range groups. Some features, such as screen size, can be attributed to higher prices, but in
general the features vary too much to say which variable predicts price. Lastly, the group means are reliably different
on dimensions 1 and 2.
Correspondence Analysis
Correspondence analysis is a method of multivariate analysis for categorical data stored in a contingency table. CA
computes factor scores for the rows and for the columns; these scores can be visualized on the same map because
they share the same variance. Each row is assigned a mass and each column a weight: the greater the mass or weight,
the greater the importance. CA is best used when your data have at least two rows and two columns and you are
searching for similarities in your data and for the strength of those similarities.
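As a concrete sketch of these steps, CA can be written directly as the SVD of the standardized deviations-from-independence matrix (the file name and layout below are assumptions):

```python
import numpy as np

# Assumed: sausages.csv holds the sausages x feelings contingency table
N = np.loadtxt("sausages.csv", delimiter=",")
P = N / N.sum()                        # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)    # row masses, column weights
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, d, Vt = np.linalg.svd(S, full_matrices=False)
F = U * d / np.sqrt(r)[:, None]        # row (sausage) factor scores
G = Vt.T * d / np.sqrt(c)[:, None]     # column (feeling) factor scores
eig = d ** 2                           # eigenvalues for the scree plot
```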
Data
In my data set, I had different brands of sausages and the sensory feelings that tasters reported after eating the
sausages. The sausages are Duby, Chimex, Capistrano, Bafar, and Alpino; the sensory reports include well-being,
relaxed, melancholic, happy, salivating, and more.
The scree plot showed that we could look at up to four dimensions, but for simplicity I will focus on two.
Methods
The contribution barplot for the rows (sausages) on dimension 1 is shown above. Duby and Bafar are the significant
contributors for this dimension. When we look at the bootstrapped row components for dimension 1, the contributions
remain the same. This means the experiment can be reliably replicated, with Duby and Bafar remaining the
significant contributors to the first dimension.
For dimension 2 (rows), Chimex, Capistrano, and Bafar make the significant contributions. When bootstrapped,
Bafar is the only significant contributor to the second dimension. This suggests that if the experiment were replicated
with an infinite population, Bafar would be the main contributor to this dimension.
The factor score map confirms that, for the rows, Chimex and Capistrano make up the second dimension and
Duby and Bafar make up the first dimension. Alpino is not a significant contributor; we must look at the cosine circle
to see whether it is an average sausage or belongs in another dimension.
Looking at the cosine circle, we see that most of the sausages are near the edge of the circle, which tells us that they
are well explained by two dimensions. Alpino, however, is near the center, which tells me it belongs in another
dimension.
For the columns (sensory feelings), we see that the positive side of dimension 1 is made up of Relaxed, Melancholic,
Soothed, and Salivating, and the negative side of Guilty, Well.being, Impressed, and Joy. These findings are
reflected in the factor score map beside it. When a bootstrap analysis is run, Melancholic remains the only significant
contributor to dimension 1, suggesting that the significance of the other contributors likely depended on the sample.
For dimension 2 (columns), the positive side is made up of Well.being, Romantic, and Melancholic, and the negative
side of Famished and Salivating. These are reflected in the column map beside it. When a bootstrap analysis is run
on the data, Romantic and Famished remain the only significant contributors, suggesting that the significance of the
other raw-data contributors was probably due to the sample.
When looking at the factor scores for the columns (feelings), we see many points near the center. The points seen in
the contribution barplot are reflected in this graph; for example, Romantic and Melancholic are on the positive end
of dimension 2. But I want to see whether the other points are true averages or belong in a different dimension. For
that, we move on to the cosine circle.
In the cosine circle we see that most of the sensory feelings are close to or touching the edge of the circle, which
shows they are well explained by these two dimensions. Thirsty, Sad, and Refreshed are the only ones not near the
edge, which tells me they are better explained by another dimension.
The final thing I will look at from this analysis is the rows-and-columns cosine circle, related to the chi-squared
residuals graph. The two graphs say the same thing. For example, the chi-squared graph tells us that Bafar should be
close to the Famished point (big blue circle) and far from the Romantic point (big red circle); comparing this to the
cosine circle, we see that this is indeed true. The same holds for all rows and columns.
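The chi-squared residuals behind that graph can be sketched in two lines from the same contingency table, reusing `N` from the CA sketch above:

```python
# Observed minus expected counts under independence, scaled by the expected
# counts; large positive residuals mark attractions (e.g. Bafar and Famished)
expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()
residuals = (N - expected) / np.sqrt(expected)
```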
Summary
To conclude, two dimensions explain well the relationship between the sausages and the sensory feelings reported by
tasters. Judging by their distance in the cosine circle, Chimex and Capistrano are very similar to each other as far as
the columns go. Alpino and Bafar are also close to each other, but we cannot rely on that because of Alpino's
distance from the edge of the circle. Duby is the most different sausage in relation to the others. As for taste, Duby
gives the sense of Guilty, Depressed, and Joy; Bafar gives the sense of Salivating, Energetic, Soothed, and Famished;
and Chimex and Capistrano give the feelings of Romantic, Well.being, and Melancholic.
Multiple Correspondence Analysis
Multiple correspondence analysis is a method for analyzing multiple qualitative variables. If our data set has
quantitative variables, we must first translate them into qualitative variables. Much of MCA is like CA and PCA,
such as how we look at the chi-squared heat map. The difference is that, because the variables are nominal, we must
interpret columns that represent the different levels of one variable. We do this by disjunctively coding the variables
into 1's and 0's, so that one variable is now represented by a set of columns. The variables are combined into new
components, and the amount of each variable in a component is given by its loading.
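A minimal sketch of the disjunctive coding step, assuming a hypothetical file name and the column labels described below:

```python
import pandas as pd

df = pd.read_csv("mhl.csv")                        # hypothetical file name
X = pd.get_dummies(df[["RaceCat", "AgeCat", "ClinCourse",
                       "MajorCat", "Experience"]], prefix_sep=".")
# MCA is then CA run on this 0/1 indicator matrix (see the CA sketch above)
```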
Data
In the MHL study, researchers wanted to investigate what factors impact mental health literacy (MHL) in the
college-aged population. The variables were race ("RaceCat"), age ("AgeCat"), whether a student has taken a clinical
course ("ClinCourse"), major ("MajorCat"), and the student's experience with mental health. The supplementary
variable is gender. Our observations are grouped by MHL score ("low", "med", "high").
Methods
To begin, we look at the MCA scree plot and decide how many dimensions to examine. For simplicity, I am focusing
on the first two dimensions, although if we wanted to get more out of this study, we could look at up to four.
When looking at the factor score map, we can extract the two most important contributions based on how far they
pass the dimension's contribution threshold. We see that Major makes up the positive ends of dimensions 1 and 2,
and ClinCourse makes up the positive end of the first dimension.
In the bar plot on the left, we can confirm that dimension 1 is made up of Major and ClinCourse. For dimension 2,
the bar plot tells us that Major and Race are the primary contributors, which shows that the threshold on the factor
score map could also include Race.
When a pseudo-bootstrap analysis is run on the first two dimensions for these variables, we see that if this experiment
were replicated with an infinite population, all variables except Experience would be relevant for the first dimension,
and Major, Race, and Age would be significant contributors to the second dimension.
The chi-squared heat map shows coefficients of correlation. The highest correlation between our variables is between
ClinCourse and Major (.44), which suggests that a student's major is positively correlated with whether they take a
clinical course. The map also shows a very low positive correlation (.04) between a student's level of experience and
whether they have taken a clinical course ("Experience", "ClinCourse").
The factor scores with confidence intervals for our observations show that the mean values of the score groups
("low", "med", "high") are reliably separate, but when we look at the overlapping tolerance intervals, we cannot
confidently assign an individual observation to a score group.
The next thing to look at is the factor scores for our important variables. We already saw above that "MajorCat" and
"ClinCourse" are significant contributors to our dimensions; here we can dissect further and see which levels of these
variables contribute to which dimension. The negative end of the first dimension is made up of "MajorCat.STEM",
"MajorCat.Hu/So" (humanities and social sciences), "MajorCat.Econ", and "ClinCourse.NoClin"; the positive end of
the first dimension of "MajorCat.Psyc", "MajorCat.Educ", "ClinCourse.Clin", and "MajorCat.ApMed". The second
dimension is made up of "MajorCat.Econ", "MajorCat.ApMed" (applied health science), and "ClinCourse.NoClin"
on the negative end, and "MajorCat.Educ", "ClinCourse.Clin", "MajorCat.Psyc", "MajorCat.Hu/So", and
"MajorCat.STEM" on the positive end. This tells us which levels of the majors make up which dimension, and we
might be able to use it to find out which levels line up with which MHL score. It is also apparent that some variables
were positive on one dimension but negative on the other. To see which variable is best represented in which
dimension, we look to the bar graphs.
Here we can see how well each dimension tells us about each variable. For example, the "ClinCourse" levels, along
with a few "MajorCat" levels, are best represented in the first dimension, while "AgeCat", "RaceCat.NaAm", and
"RaceCat.Wh/Ca" are significant contributors to the second dimension. The relationships between the variables and
the dimensions are clearer here than before; for example, we can hypothesize that having taken a clinical course is
related to achieving a higher MHL score.
If we run a bootstrap analysis, dimension 2 is the only one that differs from the raw data: many new variables become
significant contributors to consider if we want to replicate this study.
The factor score map for all the variables is very busy, and we cannot tell from it which variables are true averages;
to decide, we must compare it to the cosine circle. The true averages are "AgeCat.18to22", "RaceCat.Bl/Af/Am",
"ClinCourse.NoClin", and possibly "MajorCat.Hu/So". The only variables that are well explained by two dimensions
are "RaceCat.Multi", "ClinCourse.Clin", "MajorCat.Psyc", "RaceCat.Wh/Ca", "ClinCourse.NoClin",
"MajorCat.STEM", "MajorCat.Hu/So", and "AgeCat.18to22". The rest of the variables are better examined in their
respective dimensions.
For our supplementary variable, "Male" makes up the negative end of dimension 1 and the positive end of dimension
2; the opposite is true for "Female". We might be able to infer from this which gender scores higher on the MHL test.
Summary
Through this MCA we can conclude that a high MHL score depends on multiple qualitative variables.
Our supplementary variable plot suggests that females tend to score higher on the MHL test than males, but with a
very low eigenvalue we cannot assert this confidently.
What we can assume is that taking a clinical course is correlated with a high MHL score. Major also relates to a
student's score, with STEM, humanities/social science, economics, and applied health science majors associated with
low MHL scores, as opposed to psychology and education majors, who are associated with higher MHL scores.
As for demographics, it appears that students who are female, white, and above the age of 28 score higher on the
MHL test than males who are Black/African American, Hispanic/Latino, Asian, or Native American and between
the ages of 18 and 22, who generally score lower. Because of the overlapping tolerance intervals, we cannot reliably
assign an MHL score to any single major, although the group averages do differ. And although we do see some high
correlations in this analysis, the study's design, logistical issues, and low eigenvalues tell us that we should not
assume any definite causal relationships.
Discriminant Correspondence Analysis
Discriminant correspondence analysis (DiCA) is a method for analyzing qualitative, categorical, or nominal data. The
same steps as in CA and MCA apply here (such as disjunctive coding), and we plot our data to exploit distributional
equivalence. After computing the barycenters of the groups, we plot our observations and can see the relationships
between the variables. This method is exactly like BADA except that we are not using quantitative data.
Data
The same MHL data will be used for this analysis, in which mental health literacy was tested. The variables were race
("RaceCat"), age ("AgeCat"), whether a student has taken a clinical course ("ClinCourse"), major ("MajorCat"), and
the student's experience with mental health. The supplementary variable is gender. Our observations are grouped by
MHL score ("low", "med", "high").
Methods
The eigenvalue scree plot has only two dimensions, so these are the ones we will look at.
For the MHL score groups, we see a difference from MCA on dimension 2: dimension 1 still separates the low and
high scores, but now we also see a slight separation between low/high and medium scores on the second dimension.
The confidence intervals show that the mean scores are reliably different from each other, but when we look at the
tolerance intervals, we cannot reliably assign an individual score to a group because of the overlapping hulls.
In our factor score map, it is hard to determine which variables are significant on the first dimension without looking
at the bar plot. It is safe to assume that ClinCourse.Clin, RaceCat.Wh/Ca, MajorCat.Psyc, and AgeCat.28Plus are on
the positive end and that ClinCourse.NoClin, MajorCat.STEM, MajorCat.Econ, and RaceCat.His/La are on the
negative end. These scores show which variables tend to go with higher and lower MHL scores, respectively.
In our contribution bar plot we see that the first dimension separates ClinCourse.Clin, MajorCat.Psyc, and
RaceCat.Wh/Ca on the positive end from ClinCourse.NoClin, MajorCat.Econ, and MajorCat.STEM on the negative
end. This confirms my assumptions from the factor score map. When a bootstrap analysis is run on this dimension,
almost every level of the variables becomes a significant contributor, which tells us what we would need to consider
to replicate this experiment with an infinite population.
The second dimension separates MajorCat.Educ and MajorCat.Psyc on the positive end from MajorCat.Econ on the
negative end, giving us more detail than the factor score map about which contributions are important. A bootstrap
analysis of this dimension leaves only MajorCat.Psyc and MajorCat.Educ as significant contributors, telling us that
these are the only two to consider on this dimension for replication.
To see which variables contribute more to our analysis, we can look at this variable contribution map. It shows that
the first dimension consists of AgeCat, Experience, and ClinCourse, while the second dimension consists of
MajorCat, ClinCourse, AgeCat, and Experience.
For the confusion matrices, the fixed results have slightly higher accuracy than the leave-one-out (LOO) results.
Here we see when scores are predicted correctly versus incorrectly. There is not much of a difference between the two
methods, but overall the prediction accuracy of DiCA is low. This leads me to think that either more data is needed
or a different analysis method would yield higher accuracy.
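The fixed-versus-LOO comparison can be sketched with a nearest-centroid classifier standing in for DiCA's assign-to-the-closest-barycenter rule; `row_scores` (observation factor scores) and the "MHL" labels are assumed to come from the fitted analysis:

```python
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neighbors import NearestCentroid
from sklearn.metrics import confusion_matrix

y = df["MHL"]                                       # true group labels (assumed)
clf = NearestCentroid()
# fixed: train and predict on the same data; LOO: hold out one row at a time
fixed = confusion_matrix(y, clf.fit(row_scores, y).predict(row_scores))
loo = confusion_matrix(y, cross_val_predict(clf, row_scores, y, cv=LeaveOneOut()))
```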
Summary
We can conclude from this analysis that those who major in psychology or education, have taken a clinical course,
are white/Caucasian, and are above the age of 28 are more likely to have high mental health literacy. Those with
lower mental health literacy scores tend to major in economics, STEM, or the humanities/social sciences and to have
not taken a clinical course. The accuracy scores tell us that we cannot reliably make these predictions; further
analysis, or more data, is needed for reliability.
Citation
Miles, Rona, et al. “Mental Health Literacy in a Diverse Sample of Undergraduate Students: Demographic,
Psychological, and Academic Correlates.” BMC Public Health, vol. 20, no. 1, 2020,
https://doi.org/10.1186/s12889-020-09696-0.
Partial Least Squares Correlation
Partial Least Squares Correlation is an analytical technique in which we normalize our data by rows instead of by
columns. In our analysis, we work with "pancake" data: many variables and few observations, all of which are
multicollinear. The goal of PLSC is to find the components that maximize the covariance between latent variables
computed from two quantitative matrices. It relates two tables to each other in order to find what they have in
common. The components are composed of saliences (loadings) and latent variables (factors), and we obtain one
pair of latent variables, one from each data matrix, per component.
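A minimal sketch of this computation, assuming the two tables are available as numpy arrays (the file name and the column split below are assumptions):

```python
import numpy as np

college = np.loadtxt("college.csv", delimiter=",", skiprows=1)  # hypothetical file
Xc, Yc = college[:, :9], college[:, 9:]   # cost vs. performance split (assumed)

# z-score each table, then take the SVD of the cross-correlation matrix;
# the singular vectors are the saliences, and projecting each table onto its
# saliences gives the paired latent variables
Zx = (Xc - Xc.mean(0)) / Xc.std(0)
Zy = (Yc - Yc.mean(0)) / Yc.std(0)
R = Zx.T @ Zy
U, d, Vt = np.linalg.svd(R, full_matrices=False)
Lx, Ly = Zx @ U, Zy @ Vt.T                # pairs of latent variables
print(d ** 2 / (d ** 2).sum())            # share of covariance per component
```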
Data
The data comprise 777 colleges and universities and many variables. The schools are grouped into "Private" and
"Public". The X-set of variables relates to costs, while the Y-set relates more to performance.
Apps: Number of applications received
Accept: Number of applications accepted
Enroll: Number of new students enrolled
Top10perc: Pct. new students from top 10% of H.S. class
Top25perc: Pct. new students from top 25% of H.S. class
F.Undergrad: Number of fulltime undergraduates
P.Undergrad: Number of parttime undergraduates
Outstate: Out-of-state tuition
Room.Board: Room and board costs
Books: Estimated book costs
Personal: Estimated personal spending
PhD: Pct. of faculty with Ph.D.'s
Terminal: Pct. of faculty with terminal degree
S.F.Ratio: Student/faculty ratio
perc.alumni: Pct. alumni who donate
Expend: Instructional expenditure per student
Grad.Rate: Graduation rate
Methods
In the eigenvalue scree plot, judging by the Kaiser line, one component holds almost all the variance. For simplicity,
I am going to look at the first two components, which together explain about 98% of the variance.
The important contributions to the Y latent variable dimension are made by Grad.Rate, Terminal, PhD, Top25perc,
and Top10perc on the negative end and S.F.Ratio on the positive end.
A bootstrap analysis of the Y-set adds P.Undergrad, F.Undergrad, Enroll, and Apps, suggesting that these are also
important variables to consider when replicating this experiment.
The important contributions to the X latent variable dimension seem to be made only by Outstate, perc.alumni, and
Expend, all of which are on the negative end.
A bootstrap ratio of the X-set adds Room.Board on the negative end and Personal on the positive end, suggesting
that these are important variables to consider when replicating this experiment with an infinite population.
This latent variable map shows that Private and Public are separated along the X latent variable. It can also be
observed that the means and confidence intervals of Private and Public are significantly different, which suggests
that these two groupings of colleges can reliably be separated from one another in terms of their means and group
assignment.
The correlation heat map confirms many of the relationships between our variables that we might expect. We can
see groups of high positive correlations between variables such as Expend:Top10perc, Expend:Top25perc,
perc.alumni:Top10perc, perc.alumni:Top25perc, and so on. The strongest negative correlations are observed in the
S.F.Ratio column and in the Grad.Rate:Personal pair.
Summary
The visualization of this data tells us a lot about the colleges and their groupings. Being in the top 10% or 25% of
one's high-school class is most strongly associated with instructional expenditure per student, room and board costs,
and out-of-state tuition; these students also seem to be the ones who donate the most as alumni. The percentage of
faculty with PhDs seems related to most of the same variables (Outstate, Room.Board, perc.alumni, Expend), and
the same goes for the percentage with terminal degrees and for graduation rate, although graduation rate seems to
go down as personal spending goes up. Lastly, there is a strong negative correlation between the student/faculty
ratio and instructional expenditure, percentage of alumni who donate, room and board costs, and out-of-state tuition,
suggesting a disadvantage when there are more students per faculty member.
As for the groupings of these colleges (public/private), there is a reliable difference: the latent variable map shows
that the schools are clearly separated into private and public, both in means and in confidence intervals. This
suggests that there truly is a difference between attending a private school and a public school.
DiSTATIS
DiSTATIS is a method for analyzing multiple tables of data (three or more). In this method, the
variables are whole data tables, and we find the best linear combination of these tables in order
to create a new compromise table from the old ones. In DiSTATIS, the latent variables are called partial
projections: partial because, when we combine them all, we get the whole latent variable, and we
have one per original data table. To run DiSTATIS, the data are processed with ideas from
multiple factor analysis and multidimensional scaling to obtain distance matrices; these distance
tables are then converted into pseudo-covariance tables using double centering.
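A sketch of the double centering and of the RV coefficients used below, assuming the participants' distance matrices have already been built from the sorting data (file names are hypothetical):

```python
import numpy as np

# one 30x30 beer-by-beer distance matrix per participant (assumed files)
D = [np.loadtxt(f"judge_{k}.csv", delimiter=",") for k in range(1, 52)]

def double_center(Dk):
    """Convert a squared-distance matrix into a pseudo-covariance matrix."""
    n = Dk.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    return -0.5 * J @ (Dk ** 2) @ J

S = [double_center(Dk) for Dk in D]

def rv(A, B):
    """RV coefficient: a matrix-level analogue of a squared correlation."""
    return np.trace(A @ B) / np.sqrt(np.trace(A @ A) * np.trace(B @ B))

Crv = np.array([[rv(Si, Sj) for Sj in S] for Si in S])  # between-judges RV map
# the compromise is the weighted average of the S matrices, with weights taken
# from the first eigenvector of Crv
```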
Data
The data I was given is from a sorting task in which participants sorted Mexican beers (rows).
Participants are grouped by gender (M/W).
Participants: C1-C51
Beers: Minerva PA, Cucapa Miel, Tempus Clasica, Tempus DM, Calavera MIS, Minerva Stout, St
Peters, Calavera APA, Tempus Dor, Jack, Patricia, Cucapa CH, 7 Barrios, Alebrije, Ramuri, Corona,
Indio, Victoria, Leon, Bohemia, Modelo, Heineken, Negra Modelo, Pacifico, Tecate, Bohemia Osc,
Guiness, Carolus, Sol, Noche Buena
The first scree plot shows that our variance is explained largely by one dimension, which is what I will
analyze. This strong first eigenvalue (a large share of the variance on the first component) tells us that
people generally agree on the sorting of the beers.
The RV map between judges shows the correlations between the different participants. The diagonal
consists of all 1's, which shows that each participant does not differ from themselves, obviously. As for
those that do differ, judges C51 and C1 did not sort the beers the same way, whereas judges C5 and C2
sorted the beers very similarly.
This RV factor map shows the difference between men and women on the sorting task. There seems to
be a general separation between the genders: men fall more on the negative end of the first dimension
and women more on the positive end. The confidence intervals, however, tell me that the means of the
men and the women do not differ and that we cannot reliably assign participants to groups. The second
dimension explains only 5% of the variance, so I will only consider the first dimension to explain the data.
The compromise scree plot shows us the dimensions for the products. We again see a big drop-off in
variance, which tells us that the participants generally agree on the sorting of the beers. The compromise
is the weighted average of the participants' sortings of the products. Here we can look at two dimensions
to explain the variance.
The heat map of relationships between the rows (the beers) looks the way we would expect after seeing
the scree plot. We see big blocks of correlation along the diagonal, which tells us that there was general
agreement in sorting these beers: the participants saw a positive correlation within the first 15 beers, a
positive correlation within the last 15 beers, and negative correlations between the first 15 and the last 15.
This compromise factor map shows us the sub-groupings among the products. There seem to be three
clear sub-groups among the Mexican beers, likely reflecting the style of beer, such as lager, light, and
dark. These groupings are separated on both dimensions, but the second dimension explains only 10%
of the variance whereas the first explains 34%, so the separation of these groups on the first dimension
carries more weight.
The partial factor score map on the compromise shows us the difference between men's and women's
ratings of these products. Since the lines connected to most beers are about the same length, we can
assume that, for most of the beers, men and women sorted them the same way. There are a couple of
beers for which the lines differ in length, such as Guiness (the one closest to the middle of the graph),
which shows a difference between women's and men's ratings of that beer.
Summary
We can infer that men and women generally do not sort Mexican beers much differently from one
another. From additional research, it seems that the beers separated on the factor score map differ in
qualities like lightness and bitterness. The beers on the negative side of the first dimension seem to be
clean, crisp beers (except for Guiness, which is a dark beer). The subgroup on the positive end of the
second dimension seems to consist of lighter, brighter beers, and the subgroup on the negative end of
darker, more bitter beers. In general, judging by the partial factor score map, men and women rated
these beers the same, with a couple of exceptions such as Guiness and Heineken. The scree plots and
heat maps also tell us that there was general agreement among these participants on the sorting of the
Mexican beers, with a steep drop-off in the eigenvalues and large blocks of correlation in the row heat map.